Importing Libraries to build Ensemble Machine Learning Algorithms

File Reading

Data Pre-Processing

Let's first consider the features which may influence the model.

Checking the distributions, skewness and interquartile range to decide how to fill the null values in the data

Filling the null values with mean after checking the distributions
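The check-then-fill step can be sketched as below. This is a minimal illustration, not the notebook's own code: the column names, the toy values and the `skew_threshold` parameter are assumptions, and the median fallback for strongly skewed columns is an addition (the notebook states only that the mean was used).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names are assumptions, not the notebook's.
df = pd.DataFrame({
    "height_cm": [180.0, 175.0, np.nan, 185.0, 178.0],
    "weight_kg": [75.0, np.nan, 80.0, 82.0, 77.0],
})

def fill_nulls(frame, skew_threshold=1.0):
    """Report skewness and IQR per numeric column, then fill NaNs.

    Uses the mean for roughly symmetric columns and the median when
    |skewness| exceeds skew_threshold (an assumed cut-off).
    """
    filled = frame.copy()
    for col in filled.select_dtypes(include="number").columns:
        skew = filled[col].skew()
        q1, q3 = filled[col].quantile([0.25, 0.75])
        print(f"{col}: skew={skew:.2f}, IQR={q3 - q1:.2f}")
        use_median = abs(skew) > skew_threshold
        value = filled[col].median() if use_median else filled[col].mean()
        filled[col] = filled[col].fillna(value)
    return filled

clean = fill_nulls(df)
```

Both toy columns are close to symmetric, so each missing value is replaced by its column mean.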

Exploratory Data Analysis

Observations: England has the highest number of players in FIFA 21. One likely reason is that the EA franchise has its largest user base in the UK. In addition, the English league system has the most teams in the game, and therefore the most players.

England and Brazil deserve a mention here: England has produced 1856 players and still averages a rating of 63.28, while Brazil has the highest average player rating.

Bolton Wanderers has 48 players, while Chelsea, Manchester United and AS Monaco each have 45. As a general observation, the average number of players per club is higher in the English Premier League than in any other league, which suggests that the game prioritizes English football.

As per the above chart, two teams deserve a special mention. The first is Bayern Munich, which has the highest average rating of all teams (81.46) from a set of 26 players. The second is Real Madrid, which has the highest average among the teams with 45 players: 79.06 across 33 players.

For a fit footballer, height and weight are generally in proportion; otherwise the player would be too light or too heavy to reach peak fitness. The scatter plot above shows this relationship.

The most common position is Striker, followed by Center Back and Goalkeeper.

The age distribution is roughly bell-shaped with its peak shifted to the left: most footballers are aged 20 to 24.

Lionel Messi tops the board, followed by Cristiano Ronaldo. Among the youngest players, Kylian Mbappe, Jadon Sancho and Trent Alexander-Arnold deserve a special mention.

Filtering for young talents: the Potential must not equal the Overall score, and the player's age must be under 25
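A minimal sketch of this filter in pandas follows. The frame, the player names and the column names (`age`, `overall`, `potential`) are hypothetical stand-ins for the notebook's data.

```python
import pandas as pd

# Hypothetical toy data; names and column labels are assumptions.
players = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [19, 23, 27, 21],
    "overall": [70, 80, 85, 75],
    "potential": [85, 80, 86, 82],
})

# Keep players whose potential differs from their overall score
# and who are younger than 25.
mask = (players["potential"] != players["overall"]) & (players["age"] < 25)
young_talents = players[mask].sort_values("potential", ascending=False)
```

Player B drops out because potential equals overall, and player C because of age.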

Representation of players with their age, position and potential

Correlation plot

Feature Selection

Label Encoding
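As an illustration, categorical columns could be label-encoded with scikit-learn roughly as follows. The column names and values are assumptions. Note that `LabelEncoder` assigns integer codes in sorted order of the class labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns; not taken from the notebook.
df = pd.DataFrame({
    "preferred_foot": ["Right", "Left", "Right"],
    "position": ["ST", "CB", "GK"],
})

# Keep one fitted encoder per column so codes can be mapped back later.
encoders = {}
for col in ["preferred_foot", "position"]:
    enc = LabelEncoder()
    df[col + "_enc"] = enc.fit_transform(df[col])
    encoders[col] = enc

# Recover the original labels from the integer codes.
decoded = encoders["position"].inverse_transform(df["position_enc"])
```

Storing the encoders makes it possible to turn predicted integer classes back into position names.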

Feature selection using the SelectKBest technique

Dataframe creation after feature selection
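The selection and the reduced dataframe can be sketched as below. The synthetic data, the feature names, the `f_classif` scoring function and `k=2` are all assumptions chosen for illustration; the notebook does not state which score function or `k` it used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: two features depend on the class label, two are pure noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 4, size=n)
X = pd.DataFrame({
    "informative_a": y + rng.normal(0, 0.3, n),
    "informative_b": 2 * y + rng.normal(0, 0.5, n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# Score each feature with the ANOVA F-test and keep the top k.
selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)

# Build the reduced dataframe from the selected column names.
selected = list(X.columns[selector.get_support()])
X_selected = X[selected]
```

Indexing by column name keeps the result a labelled dataframe, whereas `selector.transform` returns a bare array by default.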

Splitting the dataframe into train and test sets for validation
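A minimal sketch of the split, assuming a four-class target. The 80/20 ratio, the stratification and the `random_state` value are assumptions rather than the notebook's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and a balanced four-class target (10 samples per class).
X = np.arange(80).reshape(40, 2)
y = np.tile([0, 1, 2, 3], 10)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

Stratifying matters when some positions are much rarer than others, since a plain random split could leave a class under-represented in the test set.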

Model Building

Model building using four classifiers: Support Vector Classifier, Logistic Regression Classifier, Random Forest Classifier, and Decision Tree Classifier. Of these, only Random Forest is an ensemble method (it bags many decision trees); the other three are single models that serve as points of comparison, and none of the four uses boosting. Bagging often improves on a single decision tree, which the evaluation at the end checks.
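The four base models can be fitted side by side along these lines. This is a sketch on synthetic data from `make_classification`, not the FIFA dataset; the scaling pipelines for SVC and Logistic Regression and all parameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic four-class problem standing in for the player-position data.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SVC and logistic regression are scale-sensitive, so they get a scaler.
base_models = {
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its test accuracy.
base_scores = {}
for name, model in base_models.items():
    model.fit(X_train, y_train)
    base_scores[name] = model.score(X_test, y_test)
```

Keeping the models in a dictionary lets the same loop be reused for the tuned versions.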

Support Vector Classifier

SVC Base Model

Hyperparameter Tuning of the SVC Model

Printing the Best Hyperparameter values

Tuned Model for SVC
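The tune, print and refit sequence for the SVC might look like the sketch below. It uses `GridSearchCV` on synthetic data; the parameter grid, the 3-fold cross-validation and the pipeline step names are assumptions, and the notebook may have used a different search strategy. The same pattern applies to the other three classifiers with their own grids.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic four-class data standing in for the player-position data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Hypothetical grid; "svc__" routes each parameter to the SVC step.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Best hyperparameter values found by cross-validation.
print(search.best_params_)

# best_estimator_ is already refit on the full training set.
tuned_model = search.best_estimator_
tuned_accuracy = tuned_model.score(X_test, y_test)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking validation data into the scaling.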

Logistic Regression Classifier

Logistic Regression Classifier Base Model

Hyperparameter Tuning of the Logistic Regression Classifier

Printing the Best Hyperparameter values

Tuned Model for Logistic Regression Classifier

Random Forest Classifier

Random Forest Classifier Base Model

Hyperparameter Tuning of the Random Forest Classifier

Printing the Best Hyperparameter Values

Tuned Model for Random Forest Classifier

Decision Tree Classifier

Decision Tree Classifier Base Model

Hyperparameter Tuning of the Decision Tree Classifier

Printing the Best Hyperparameter Values

Tuned Model for Decision Tree Classifier

Random Forest Classifier Base Model Evaluation Metrics

Random Forest Classifier Base Model Confusion Matrix
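The metrics and confusion matrix for a Random Forest can be computed roughly as follows. The data is synthetic, and the choice of macro-averaged precision, recall and F1 is an assumption about which metrics the notebook reports.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic four-class data standing in for the player-position data.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Macro averaging weights each of the four classes equally.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2, 3])
```

The diagonal of `cm` counts correct predictions, so its trace divided by the total equals the accuracy. The matrix can be passed to a plotting function to draw a heat map.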

Random Forest Classifier Base Model Heat Map

Random Forest Classifier Tuned Model Evaluation Metrics

Random Forest Classifier Tuned Model Confusion Matrix

Random Forest Classifier Tuned Model Heat Map


Model Summary

Model Summary for 4 Major Positions Classifier Base Model

Evaluation Summary Table for 4 Major Positions Classifier Base Model
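A summary table of this kind can be assembled with pandas as sketched here. The labels and predictions are hard-coded toy values, and `model_a` and `model_b` are placeholder names; none of these numbers are the notebook's results.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

# Toy ground truth and predictions; placeholders, not real model output.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
predictions = {
    "model_a": [0, 1, 2, 3, 0, 1, 3, 2],
    "model_b": [0, 1, 2, 3, 0, 1, 2, 3],
}

# One row per model, one column per metric.
rows = [
    {
        "model": name,
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
    for name, y_pred in predictions.items()
]

# Sort so the best-scoring model appears first.
summary = pd.DataFrame(rows).set_index("model").sort_values(
    "accuracy", ascending=False
)
```

From here, `summary.plot.bar()` would give a grouped bar chart of the same numbers if matplotlib is installed.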

Evaluation Summary Graph for 4 Major Positions Classifier Base Model

Model Summary for 4 Major Positions Classifier Tuned Model

Evaluation Summary Table for 4 Major Positions Classifier Tuned Model

Evaluation Summary Graph for 4 Major Positions Classifier Tuned Model

On the evaluation metrics above, the Random Forest Classifier outperforms the other three models, in both its base and tuned versions, at predicting a player's position among the four major positions.